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Abstract 

A conditional independence graph is a concise representation of pairwise con- 
ditional independence among many variables. We propose Graphical Ran- 
dom Forests (GRaFo) for estimating pairwise conditional independence re- 
lationships among mixed-type, i.e. continuous and discrete, variables. The 
number of edges is a tuning parameter in any graphical model estimator and 
there is no obvious number that constitutes a good choice. Stability Selec- 
tion helps choosing this parameter with respect to a bound on the expected 
number of false positives (error control). 

We evaluate and compare the performance of GRaFo with Stable LASSO 
(StabLASSO), a LASSO-based alternative, across 5 simulated settings with 
p = 50, 100, and 200 variables, and we apply GRaFo to data from the Swiss 
Health Survey in order to evaluate how well we can reproduce the inter- 
connection of functional health components, personal, and environmental 
factors, as hypothesized by the World Health Organization's International 
Classification of Functioning, Disability and Health (ICF). 

GRaFo performs well with mixed data and thanks to Stability Selection 
it provides an error control mechanism for false positive selection. 
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In many problems one is not confined to one response and a set of pre- 
defined predictors. In turn, the interest is often in the association structure 
of a whole set of p variables, i.e. asking whether two variables are indepen- 
dent conditional on the remaining p-2 variables. A conditional independence 
graph (CIG) is a concise representation of such pairwise conditional indepen- 
dence among many possibly mixed, i.e. continuous and discrete, variables. In 
CIGs, variables appear as nodes, whereas the presence (absence) of an edge 
among two nodes represents their dependence (independence) conditional on 
all other variables. Applications include among many others also the study 



et al. (2011)). 



,pf fun ctional health (Strobl et al. (2009); Kalisch et al. (2010); Reinhardt 



We largely focus on the high- dimensional case where the number of vari- 
ables (nodes in the graph) p may be larger than sample size n. A popu- 
lar approach to graphical modeling is based on the Least Absolute Shrink- 



age and Selecti on Operator (LASSO, Tibshirani (1996)): see Meinshausen 



and Biihlmann (2006) or Friedman et al. (2008) for the Gaussian case and 
Ravikumar et al. (2010) for the binary case. However, empirical data of- 
ten involve both discrete and continuous variables. Conditional Gaussian 
distributions were suggested to model such mixed-type data with maximum 



likelihood inference (Lauritzen and Wermuth (1989)), but no corresponding 
high-dimensional method has been suggested yet. Dichotomization, though 



always applicable, comes at the cost of lost information (MacCallum et al. 



(2002)). 



Tree-based methods are easy to use and accurate for dealing with mixed- 



type data (Breiman et al. (1984)). Random Forests (Breiman (2001)) eval- 



uate an ensemble of trees often resulting in notably improved performance 
compared to a single tree (see also Amit and Geman (1997)). Furthermore, 



permutation importance in Random Forests allows to rank the relevance of 
predictors for one specific response. Since the definition of permutation im- 
portance differs for discrete and continuous responses, ranking permutation 
importances across responses of mixed-type is less obvious. However, such 
ranking is essential to derive a network of the most relevant dependencies. 
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34 Stability Selection proposed by Meinshausen and Biihlmann (2010) is one 



35 possible framework to rank the edges in the CIG across different types of 

36 variables. In addition, it allows to specify an upper bound on the expected 

37 number of false positives, i.e. the falsely selected edges, and thus provides a 

38 means of error control. 

39 We combine Random Forests estimation with appropriate ranking among 

40 mixed-type variables and error control from Stability Selection. We refer to 

41 the new method as Graphical Random Forests (GRaFo). The specific aims 

42 of the paper are a) to evaluate and compare the performance of GRaFo with 

43 Stable LASSO (StabLASSO), a LASSO-based alternative, across 5 simulated 

44 settings comprising different distributions for p = 50, 100, and 200 possibly 

45 mixed-type variables while sample size is n = 100, and b) to apply GRaFo 

46 to data from the Swiss Health Survey (SHS) to evaluate the interconnec- 

47 tion of functional health components, personal, and environmental factors, 

48 as hypothesized by the World Health Organization's (WHO) International 

49 Classification of Functioning, Disability and Health (ICF). 



50 2. Graphical Modeling Based on Regression- Type Methods 

51 2.1. Conditional Independence Graphs 

52 Let X = {X%, . . . , X p } be a set of (possibly) mixed-type random variables. 

53 The associated conditional independence graph of X is the undirected graph 

54 G C iG = (V, £(G CX q)), where the nodes in V correspond to the p variables in 

55 X. The edges represent the pairwise Markov property, i.e. i - j £{G CIG ) if 

56 and only if Xj J_ Aj|X \ {Xj, X^}. For a rigorous introduction to graphical 



57 



models, see, for example, the monographs by Whittaker (1990) or Lauritzen 



58 (1996). 



59 We will now show that the pairwise Markov property can, under certain 

60 conditions, be inferred from conditional mean estimation. 

Theorem 1. If Xj is a continuous random variable, assume that the con- 
ditional distribution of Xj given X \ {Xj} has a conditional density with 
respect to Lebesgue measure, and assume that E[X,|X \ {X,-}] exists for all 
j. Furthermore, assume that the conditional density (for continuous random 
variable) or the conditional point probability (for discrete random variable) 
of Xj\X. \ {Xj} is of the form 

f( Xj \X s {X,}) = f( Xj \h(E[X,\X x {Xj}])) VX x {Xj}, \/ Xj (CI) 
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ei for some invertible function h. Then Xj ± Xj|X \ {Xj,Xi} if and only if 

62 E[Xj|X \ {Xj}] is a function not depending on Xj (i.e. E[Xj|X \ {Xj}] = 

63 ?:[.V ; |X \ {.V r .V,]];. 

A proof is given in Section[7j Assumption (CI) trivially holds for a binary 
random variable Xf for Xj = 1 

f(xj = 1|X \ {Xj}) = P[Xj = 1|X \ {Xj}] = E[Xj\X \ {Xj}] 



and hence the function h is the identity. Analogously, for a multinomial 



random variable Xj with C categories: for Xj = (x^ , . . . , xy* ) and r 



1 C: 



f r ( Xj \X x {Xj}) = {*f(X x {A,})}4 r) {l - ^(X x {X,})} 1 -" 

withvr J (r) (X\{X J }) =P[xj r) = 1|X\{X,}] =E[xj r) |X\{Xj}]. Hence, (CI) 
holds with the identity function h. Moreover, if (Xi, . . . ,X P ) ~ A/^(0,S), 



then (CI) holds as well (see for example Lauritzen (1996)). However, for the 
conditional Gaussian case, we need to require for (CI) that the variance is 
fixed and is not depending on the variables we condition on. For example, 
let Xi ~ B(l,ir) be Bernoulli distributed and let 

64 Then the distribution of X2IX1 is not a function of the conditional mean 

es alone. 

ee Theorem 1 motivates our approach to infer conditional dependences, or 
67 edges in the CIG, via variable selection for many nonlinear regressions, i.e. de- 
es termining whether a variable Xj is relevant in E[Xj|X \ {Xj}] (regression of 

69 Xj versus all other variables). 

70 2.2. Ranking Edges 

71 In order to determine which edges should be included in the graphical 

72 model, the edges suggested by the individual regressions need to be ranked 

73 such that a smaller rank indicates a better candidate for inclusion. Depending 

74 on the assumed distribution of the responses and predictors, different mea- 

75 sures are available. For instance, in ordinary least squares regression, where 

76 all variables are continuous, the size of the standardized regression coeffi- 

77 cients is an obvious global ranking criterion. Analogously, when all variables 
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78 are binary, coefficients from linear logistic regression lead to a global ranking. 

79 Note that each edge i-j is associated with two coefficients (Xj regressed on 
so Xi and all other variables and vice versa for Xi on Xj). To be conservative, 
si we rank each edge i-j relative to the smaller one of the two (absolute- valued) 

82 ranking coefficients. 

83 If variables are mixed-type, a global ranking criterion might be impossible 

84 to find: Gaussian and non-Gaussian variables are not directly comparable. 

85 Instead, local rankings for each regression are performed separately ( "local" 
se means that we can rank the importance of predictors for every individual 
87 regression). Analogous to global ranking, each edge i-j is associated with 
as two possible ranks and the worse among them is used. 

89 When using Random Forests for performing the individual nonlinear re- 

90 gressions, the ranking scheme is obtained from Random Forests' variable 

91 importance measure. When using the LASSO for individual linear or logistic 

92 regressions, the ranking scheme is obtained from absolute values of estimated 

93 regression coefficients. 

94 We then have to decide on the number of edges to select, i.e. the tuning 

95 parameter. Assume it is given as q thI . Then for both global and local rankings 

96 we select the q thI best-ranked edges across all p individual regressions. If this 

97 is impossible due to tied ranks, we neglect these tied edges and select only 

98 the remainder of edges not in violation of the threshold. 

99 We next outline how Stability Selection can be used to guide the choice 
of g th r- 



100 



2.3. Aggregating Edge Ranks with Stability Selection 

Stability Selection ( Meinshausen and Buhlmann ( 2010[ )) allows the spec- 



ification of an upper bound on the expected number E,[V] of false positives. 



It is based on subsampling (Politis et al. (1999); Buhlmann and Yu (2002)) 



random subsets XW, . . . , X( nsub ) of the original sample Xi,...,X n , where 
each X( fc ) contains [n/2j sample points. Let £ (G CIG (X( fc ))) denote the edges 
from a thresholded ranking based on X( fc ), k = 1, . . . ,n aub . Stability Selection 
suggests to construct £((j cig (X)), the set of all edges in the estimated CIG 
of X, from all edges that were "sufficiently stable" across the n suh subsets. 
More concretely, we choose only edges i-j which fulfill 



n. 



^ "sub 

E J {i-je£(G C T G (XM))} ^ ftto, (!) 



sub k=l 
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where 7r thr e (§, l) imposes a threshold on the minimum relative frequency of 
edges across the n sub subsets to be included in £(G CIG (X)). 



In their Theorem 1, 



Meinshausen and Buhlmann (2010) relate E[V] to 

is the tuning 



the maximum number of selected edges q thI per subset (i.e. q thT 
parameter for thresholding the ranked edges), the number of possible edges 
p-(p- l)/2 in £(G CIG (X)), and the threshold 7r thr from equation (111): 



E[V] < 



(27r to -l).p.(p-l)/2" 



(2) 



Both E[V] and Tr thI need to be specified a priori in order to determine Q tbI . 
As Meinshausen and Buhlmann (2010) argue, the choice of 7r thr is of minor 
importance for a given E[V], since a larger 7r thr will mediate a larger q thI , and 
vice versa. We can thus use equation ^ to derive 

g thr = W(2ir thI -l)E[V]-p-(p-l)/2\. 

Henceforth, we fix n sub = 100 and 7r thr = 0.75, and vary E[V] as required. 

Note that equation ^ is based on two assumptions: 1) the estimation 
procedure is better than random guessing and 2) the probability of a false 
edge to be selected is exchangeable, i.e. each false edge is equally likely to be 



we selected; for details we refer to Meinshausen and Buhlmann (2010) 



109 3. Random Forests and LASSO Regression 



in 
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3.1. Random Forests 

Random Forests have, to date, not been used to estimate CIGs. They 
perform a series of recursive binary partitions of the data and construct the 
predictions from terminal nodes. Based on classification and regression trees 
(Breiman et al. ( 1984[ )) they allow convenient inference for mixed-type vari- 
ables, also in the presence of interaction effects. Incorporating bootstrap 



man 



(Efron (1979); Breiman (1996)) and random feature selection (Amit and Ge 



(1997)), random subsets of both the observations and the predictors are 



considered. The relevance of each predictor can be assessed with permuta- 



tion importance (Breiman (2002)), a measure of the error difference between 



a regular Random Forests fit and a Random Forests fit within which one 
predictor has been permuted at random to purge its relationship with the 
Respon se. An implementation of Random Forests in R (R Development Core 



Team (2011)) is available in the randomForest package (Liaw and Wiener 
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(2002)). Since the goodness-of-fit of continuous and categorical responses is 
based on mean squared errors and majority votes, respectively, the goodness- 
of-fit and importance measures are not directly comparable across mixed-type 
responses. Thus a local ranking is derived, where each edge i-j is assigned 
either the rank of the permutation importance of predictor X*> for response 



xj or of predictor X)j K) for response X\ K} (whichever is more conservative, 
i.e. assigns a worse rank) and finally aggregated with Stability Selection; the 
upper index ( fc ) denotes the k th subsample in Stability Selection. We refer to 
this procedure as Graphical Random Forests (GRaFo) henceforth. 

3.2. Least Absolute Shrinkage and Selection Operator (LASSO) 

In the case of linear regression for continuous responses and predictors, 
the LASSO (Tibshirani (1996)) penalizes with the £i-norm and correspond- 



(k) 



ing penalty parameter A the coefficients of some less relevant predictors to 
zero. The larger A is chosen, the more coefficients will be set to zero. This 



concept has also been extended to logistic regression (Lokhorst (1999)) and 



implemented in R in the glmnet package (Friedman et al. (2010)). In the case 
of multinomial and mixed-type data, no eligible off-the-shelf implementations 
of the LASSO were available. We hence dichotomize these data according 
to a median split for continuous variables and aggregate categories such that 
the resulting frequency of the -1 and 1 categories was as balanced as possible 
for discrete variables. 

CIG estimation via the LASSO with Stability Selection was suggested for 



Gaussian data by Meinshausen and Buhlmann (2010) and can be represented 
as a global ranking. For each response X^ , we estimate LASSO regressions 
with all remaining X( fc ) \ {X ■ } as predictors and with a decreasing sequence 



149 of penalties A^- 



(*),» 



A 



(AO* 



(fc) 



Let A^; 



denote the largest penalty value of 



150 the sequence for which the coefficient of predictor X^ for response Xj K) is 

151 non-zero, and if no such penalty exists let A^ = 0. For each edge i-j we 



it 



select the more conservative penalty A^ fc j = min (A^ fe \ A^j and rank i-j 
relative to the global rank from the absolute-valued estimated regression 
coefficient corresponding to A^j. As before, the upper index ( fc ) denotes 
the k th subsample from Stability Selection. We denote this procedure in 
combination with Stability Selection as Stable LASSO (StabLASSO). 
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157 4. Simulation Study 
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4-1. Simulating Data from Directed Acyclic Graphs 
We use a directed acyclic graph (DAG, cf. 



Whittaker (1990)) to embed 



conditional dependence statements among nodes representing the p random 
variables. The associated CIG follows by moralization, i.e. connecting any 
two parents with a common child that are not already connected and remov- 



ing all arrowheads (Lauritzen and Spiegelhalter (1988)) 



Let A be a (pxp)-dimensional weight matrix with entries e {[-1, -0.1] u 
{0} u [0.1,1]} if i < j and ctij = otherwise. In addition, we sample A to 
be sparse, i.e. we expect only one percent of its entries to deviate from 0. 
The non-zeros in A encode the directed edges in a DAG we simulate from 
similarly as in Kalisch and Biihlmann (2007); see also Table [I] Furthermore, 



and Vij be vectors that we use to 



for all i, j € {1, . . . ,p} with eiy + let Uij 
impose some additional structure on multinomial variables: 1) at least one 
category of a multinomial predictor should have an effect opposite to the 
remainder, 2) the (total) effect of the categories of a multinomial predictor 
Xi should be positive on some categories of a multinomial response Xj and 



negative on others. For this purpose, we restrict = (u 
,(i) JCjh 



(i) 

ij 



. U 



(Ci) 



) and 



e {-1, 1} V/ = 1, . . . , Q s.t. - d < £ < Q 



Ci 



.(*) 



i=i 

d 



(I) 



€{-1,1} V S = l,...,Q S .t. ()<Y,<V< ( 'j- 



164 With these definitions, we sample data from different distributions us- 

165 ing the inverse link function to relate the conditional mean to all previously 
we sampled predictors. Table [T] describes the settings in detail, covering models 
167 with purely Gaussian, purely Bernoulli, purely multinomial, and an alternat- 
es ing sequence of Gaussian and multinomial variables ("mixed" setting). 

169 Jf..2. Simulating Data from the Ising Model 

A common approach to model pairwise dependencies between a set of 
binary variables is the Ising model with probability function 



p(x, 0) = exp (Yj OaXi + 



IjXiX j 



r(e)) 



(3) 
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Distribution Model 



Conditional Mean 



Gaussian Xj ~ Af(fij, a 2 - 1) fij = Y,i<j a>ijXi 

Bernoulli X, = 2X 3 - 1, vr, = ^jg^ 

Multinomial X } ~ Mfo = (tt™, . . . , Trf '>)), Trf } = -^^L> 

E r iiexp(^ ; ) 

^ ) = E^^ ) a ij Ea^ ) (27 M -l), 
~W{3,4,5}, s = l,...,C j 

MlX6d L^--f7r (1) vr^)) else tt w - ^ 

W'-^i '■■■' 7r i )} ' 6186 ^ "E^exp(^) 

- Li:i<j A ±{N V ij a ij X ij + 

+ Zi:i<jAlm v lj )a ij E£i ^C 27 ^} - 1) 
~W{3,4,5}, s = l,...,C j 



Table 1: The table shows the four simulation models based on DAGs. Af, B, .M,andW 
are the Gaussian, Bernoulli, multinomial, and discrete uniform distribution, respectively. 
Initial values for X\ are sampled with [i\ = 0, 7Ti = ^ , and 7Ti = (^-,...,^-), respectively, 
where d ~ W{3,4,5}. The weights chosen from {[-1, -0.1] u {0} u [0.1, 1]} to 

determine the dependence relationships among the random variables. The scalars uf^ 

and vf) are chosen from {-1,1} to impose additional structures on multinomial random 
variables. 
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for realizations xeX, normalization constant T(B), and (p x p)-dimensional 
symmetric parameter matrix = {%}ij6{i,...j>}- From the conditional den- 
sities of equation ^ if follows that % = (% + 0) implies the absence 



(presence) of edge i - j in the associated CIG. See also |Ravikumar et al 

( polo) 



We sample the diagonal and the upper-triangular matrix of G uniformly 
from {-1,0,1} such that the average neighborhood size for each node equals 
4. The lower-triangular matrix equals its upper counterpart. We use the 



Gibbs sampler (cf., Givens and Hoeting (2005)) to sample realizations from 
equation (|3]). Honing and Tibshirani (2009) provide an implementation in 
the BMN package in R. 

4-3. Simulation Results 

For p € {50, 100,200} variables and samples of size n = 100, each of the 5 
simulation models was averaged over 50 repetitions. The results are shown 
in Figures [T]j6} Error control for small bounds on the expected number of 
false positives E[V] could be achieved for both GRaFo and StabLASSO in 
all but the mixed setting with p = 200 in Figure [6j 

In the Gaussian, Bernoulli and Ising settings, StabLASSO seems to per- 
form slightly better than GRaFo for small error bounds and rather similar 
across the figures for the true/false positive rates (third column of Figures 
[l]{6]). Note that StabLASSO sets many coefficients to 0. As a consequence, 
a large proportion of edges cannot be selected for false positive rates smaller 
than 1 resulting in some StabLASSO curves not covering the entire range of 
the rates. 

In the multinomial and mixed setting, GRaFo returned satisfactory re- 
sults whilst dichotomization resulted in a dramatic loss in performance of 
StabLASSO (cf., [MacCallum et al.| p002j); |Altman and Royston| (|2006); 



Royston et al. (2006)). In general, both procedures seem to perform best 



in the Gaussian setting, followed by the mixed, multinomial, Bernoulli, and 
Ising setting, respectively. The latter seems especially hard for both pro- 
cedures if the upper error bound in equation ^ for E[V] is chosen small. 
Nevertheless, given one's willingness to expect more errors, the rate figures 



[Indicate th e potential to recover (parts of) the true structure (cf., Raviku- 
mar et al. (2010); Hofling and Tibshirani (2009)). The "raw" counterparts, 



Random Forests and LASSO, seem to perform quite similar to GRaFo and 
StabLASSO across all settings, but they do not provide any error measure 
or guidance how to regularize the procedure. 
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A violation of condition (CI) of Theorem 1 in the mixed setting could 
explain the failure of both GRaFo and StabLASSO to achieve error control 
for p - 200. However, both the mixed setting with p = 50 and p = 100 returned 
very few observed errors and remained well below the error bounds indicating 
the problematic behavior may be linked to larger values of p. Also, for any 



setting it is unlikely that the exchangeability assumption holds. Meinshausen 



and Buhlmann (2010) argue that Stability Selection appears to be robust to 



violations, but did not study mixed data which may be particularly affected. 

The computational cost is growing rather quickly with growing p. The 
runtime of a single of the 50 repetitions per setting is in the order of 15 
minutes for GRaFo and 20 minutes for StabLASSO for p = 50 and increases 
to several hours for GRaFo and 30 minutes for StabLASSO in the case of 
p = 200. Each batch of 50 repetitions was run in parallel on 50 cores of 
the BRUTUS high-performance cluster comprising quad-core AMD Opteron 
8380 2.5 Ghz CPUs with 1 GB of RAM per core using the Rmpi package 



(Yu (2010)) available in R. 



223 5. Functional Health in the Swiss General Population 

224 5.1. The Importance of Functional Health 

According to the World Health Organization's (WHO) new framework of 
the International Classification of Functioning, Disability and Health (ICF, 
WHO| Q2001fl ) the lived experience of health QStucki et aL] Q2008D ) can be 
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structured in experiences related to body functions and structures as well as 
to activity and participation in society. All of these are, in turn, influenced 
by a variety of so-called personal factors such as gender, income, or age and 
environmental factors including individual social relations and supports as 
well as properties of larger macro social systems such as the economy (see 
Figure [7]). Also, the WHO and The World Bank recommend in their recent 
World Report on Disability (2011) that functional health state descriptors 
are analyzed in conjunction with other health outcomes and, particularly, 
that more research is conducted on "[...] the interactions among environ- 
.mental factors, health conditions, and disability [...]" (WHO and The World 
Bank (2011) p. 267). Under these prerequisites it is of interest which vari- 



ables are conditionally dependent on each other. For instance, "Does the 
income distribution affect participation, conditional on known impairments, 
environmental, and personal factors?". 
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Figure 1: The rows correspond to the Gaussian, Bernoulli, and Ising model with p = 50. 
Their true CIGs have 16, 16 and 89 edges, respectively. The first two columns report the 
observed number of true and false positives ("o") relative to the bound in eqn. (2| for the 
expected number K[V] of false positives ("]") for GRaFo and StabLASSO, respectively, 
averaged over 50 simulations. The third column reports the averaged true and false positive 
rates of GRaFo and StabLASSO relative to the performance of their "raw" counterparts 
without Stability Selection. 
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Figure 2: The rows correspond to the Gaussian, Bernoulli, and Ising model with p = 100. 
Their true CIGs have 58, 58 and 182 edges, respectively. The first two columns report the 
observed number of true and false positives ("o") relative to the bound in eqn. (2| for the 
expected number K[V] of false positives ("]") for GRaFo and StabLASSO, respectively, 
averaged over 50 simulations. The third column reports the averaged true and false positive 
rates of GRaFo and StabLASSO relative to the performance of their "raw" counterparts 
without Stability Selection. 
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Figure 3: The rows correspond to the Gaussian, Bernoulli, and Ising model with p = 200. 
Their true CIGs have 334, 334 and 369 edges, respectively. The first two columns report 
the observed number of true and false positives ( "o" ) relative to the bound in eqn. §} for 
the expected number E[V] of false positives ("]") for GRaFo and StabLASSO, respectively, 
averaged over 50 simulations. The third column reports the averaged true and false positive 
rates of GRaFo and StabLASSO relative to the performance of their "raw" counterparts 
without Stability Selection. 
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Figure 4: The rows correspond to the multinomial and mixed-type model with p = 50. 
Their true CIGs both have 16 edges. The first two columns report the observed number 
of true and false positives ("o") relative to the bound in eqn. ^ for the expected number 
E[y] of false positives ("]") for GRaFo and StabLASSO, respectively, averaged over 50 
simulations. The third column reports the averaged true and false positive rates of GRaFo 
and StabLASSO relative to the performance of their "raw" counterparts without Stability 
Selection. 
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Figure 5: The rows correspond to the multinomial and mixed-type model with p = 100. 
Their true CIGs both have 58 edges. The first two columns report the observed number 
of true and false positives ("o") relative to the bound in eqn. ^ for the expected number 
K[V] of false positives ("]") for GRaFo and StabLASSO, respectively, averaged over 50 
simulations. The third column reports the averaged true and false positive rates of GRaFo 
and StabLASSO relative to the performance of their "raw" counterparts without Stability 
Selection. 
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Figure 6: The rows correspond to the multinomial and mixed-type model with p = 200. 
Their true CIGs both have 334 edges. The first two columns report the observed number 
of true and false positives ("o") relative to the bound in eqn. ^ for the expected number 
K[V] of false positives ("]") for GRaFo and StabLASSO, respectively, averaged over 50 
simulations. The third column reports the averaged true and false positive rates of GRaFo 
and StabLASSO relative to the performance of their "raw" counterparts without Stability 
Selection. 
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Figure 7: The International Classification of Functioning, Disability and Health (ICF) 
model relates aspects of human functioning and provides a common language for practi- 
tioners. 
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5.2. Study Population 

We use GRaFo for a secondary analysis of cross-sectional observational 
data on functional health from the Swiss Health Survey (SHS) in 2007. Data 
were obtained from the Federal Statistics Office of Switzerland. The original 
study was based on a stratified random sample of all private Swiss households 
with fixed line telephones. Within each household one household member 
aged 15 or older was randomly selected. The survey was completed by a 
total of 18760 persons, corresponding to a participation rate of 66 percent 



(Graf (2010)). The mean age of study participants was 49.6 years (±18.5). 
The data were mostly collected with computer assisted telephone interviews. 
Further information is available elsewhere (Storni ( 2011[ )). 



5. 3. Variables 

The SHS included various information on symptoms (in particular pain), 
impairments, and activity limitations. Since the respective items were some- 
times nominal, sometimes ordinal, and sometimes (e.g. body mass index) 
metric, we dichotomized each item so that 1 was indicative of having any 
kind of problem. As overall summary scores on functioning and disability 
were not recommendable (Reinhardt et al. ( 2010[ )), we followed the frame- 
work of the WHO's biopsychosocial model of health, outlined in the ICF 
iWHO (2001), see Fig ure [7| , and other theoretical considerations (WHO 



and The World Bank (2011); Reinhardt et al. (2010)) in constructing sum 
indices (see Table |2j) . The plausibility of all indices was checked using the 



Stata 11 confirmatory factor analysis module confa (Kolenikov (2009)). In 



each case the index construction was tested and the null hypothesis of a 
diagonal structure of the covariance matrix rejected. 

We created a dummy variable for labor market participation restrictions 
such that 1 identified persons who gave up work, reduced the number of 
working hours, or changed jobs because of health reasons. We also created a 
dummy variable for participation in leisure physical activity (LPA) differen- 
tiating between people participating in leisure activities leading to sweating 
at least once a week and those who do not. General health perception was 
measured with the following question and answer options: "How would you 
rate your health in general? Very good, good, fair, poor, or very poor?". 
We further included indicators of socio-economic status (SES) in our anal- 
ysis: equivalence household income, years of formal education, employment 
status, and migration background (foreign origin of at least one parent). On 
the macro- or cantonal-level we obtained information on the Swiss counties' 
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Construct 


Variable specification 


Coding 


for 'y es ' 


Impairment 


Problems with vision 


1 


(any 


problem) 




Problems with hearing 


1 


(any 


problem) 




Problems with speaking 


1 


(any 


problem) 




Bocly mass index over 30 or under 16 


1 








Urinary incontinence 


1 


(any 


problem) 




Defecation problems 


1 


(any 


problem) 




Feeling weak, tired, lack of energy 


1 


(any 


problem) 




Sleeping problems 


1 


(any 


problem) 




Tachycardia 


1 


(any 


problem) 




Range of sum index: 0-9 








Pain 


Pain in head 


1 


(any 


pain) 




Pain in chest 


1 


(any 


pain) 




Pain in stomach 


1 


(any 


pain) 




Pain in back 


1 


(any 


pain) 




Pain in hands 


1 


(any 


pain) 




Pain in joints 


1 


(any 


pain) 




Range of sum index: 0-6 








Activity 


Problems with walking 


1 


(any 


problem) 


limitation 


Problems with eating independently 


1 


(any 


problem) 




Problems with independently getting up from bed or chair 


1 


(any 


problem) 




Problems with dressing independently 


1 


(any 


problem) 




Problems with independently using the toilet 


1 


(any 


problem) 




Problems with independently taking a shower or bath 


1 


(any 


problem) 




Problems with independently preparing meals 


1 


(any 


problem) 




Problems with independently using a telephone 


1 


(any 


problem) 




Problems with independently doing the laundry 


1 


(any 


problem) 




Problems with independently caring for finances/accounting 


1 


(any 


problem) 




Problems with independently using public transport 


1 


(any 


problem) 




Problems with independently doing major household tasks 


1 


(any 


problem) 




Problems with independently doing shopping 


1 


(any 


problem) 




Range of sum index: 0-13 








Social support 


Feeling lonely 











Missing someone to turn to 











Having at least one supportive family member 


1 








Having someone to turn to 


1 








Range of sum index: 0-4 








Social network utilization 


At least weekly visits of family 


1 








At least weekly phone calls with family 


1 








At least weekly visits of friends 


1 








At least weekly phone calls with friends 


1 








At least weekly participation in clubs/associations/partics 


1 







Range of sum index: 0-5 



Table 2: Construction rules of sum indices for functioning (pain, impairment, activity 
limitation) and social integration (social support and social network utilization) from 37 
dichotomous variables. 



(cantons) gross domestic products (GDP), Gini coefficients, and crime rates 
for 2006. Moreover, we considered information on gender, age, marital sta- 
tus (being married), alcohol consumption (in grams per day), and current 
smoking (yes/no). 

Of these, in total, 20 mixed- type variables (see Table [3]), income had the 
highest number of missing values with roughly 6 percent. Overall, less than 
0.85 percent of replies were missing corresponding to 2687 cases with one 
or more missing values. To assess their effect, we estimated the CIG once 
with casewise deletion and once with imputation of missing values with the 
missForest procedure (Stekhoven and Buhlmann (2011)) available in R. 
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Type 


Variable 


% Missing 


> 2 categories 


Impairment index 


5.92 




Pain index 


0.37 




Activity limitation index 


0.69 




Social support index 


5.84 




Social network utilization index 


2.32 




General health perception 


0.05 


Dichotomous 


Male 


0.00 




Married 


0.09 




Paid work 


0.03 




Migration background 


4.73 




Smoker 


0.07 




Work restriction 


0.00 




Leisure physical activity 


0.00 


Continuous 


Age 


0.00 




Years of formal education 


0.07 




Income 


5.94 




Alcohol consumption (in grams per day) 


2.59 




Gross domestic product 


0.00 




Gini coefficient 


0.00 




Crime rate 


0.00 



Table 3: List of all 20 variables used in the CIG estimation, their type, and their percentage 
of missing values. 
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Figure 8: Conditional independence graph of the p = 20 variables (nodes) remaining after 
construction of indices based on the 2007 Swiss Health Survey estimated with GRaFo. 
Edges were selected with respect to an upper bound of 5 on the expected number of false 
positives, see equation ([2]). Five nodes (social network utilization, migration background, 
smoker, work restriction, and LPA) were isolated (no edges) and thus neglected. 



5.4- Research Hypothesis 



290 From the WHO's ICF model (WHO (2001), see Figure 78, we hypothe 



sized that all variables on functional and general health perception, and all 
variables on social status, networks, and supports were connected via paths 
within the same component of the CIG. 

5. 5. Findings 

Figure [8] shows the resulting graph from our application of GRaFo to the 
(non-imputed) data on functional health from the SHS with casewise deletion 
of missing values regularized for a bound (as in equation (J2|) for an expected 
number of false positives K[V] < 5. The selected edge sets for the imputed 
and casewise deleted data were quite similar for various bounds on K[V] and 
even identical for K[V] < 5 (not shown). In the following, we thus focus 
on the CIG derived from the complete observations remaining after casewise 
deletion of missing values. 
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The resulting edges for K[V] < 1 depict relatively obvious associations 
known from everyday observations. Interestingly, general health perception 
is conditionally dependent on activity limitation but conditionally indepen- 
dent of impairment and pain. In the larger graph for E[V] < 5, one sees that 
general health perception, impairments, and pain are connected through a 
path of several environmental and personal factors such as social support, 
being married, age, etc. That implies, for instance, that we do not need 
information on impairment to predict general health perception if we have 
information on activity limitation and the remaining predictors, whereas ac- 
tivity limitation is an essential predictor of general health perception even 
if information on all the remaining predictors is provided. For instance, a 
person with a spinal cord injury who has no activity limitation because of 
social and technological supports, could thus still report good health. This 
finding is supported by other sources reporting that many people with dis- 



abiliti es do not consider themselves to be unhealthy (WHO and The World 



Bank (2011); Watson (2002)). In the 2007-2008 Australian National Health 



319 Survey, 40 percent of people with a severe or profound impairment rated 



their health as good, very good, or excellent (Australian Bureau of Statistics 
02009) )■ 

As regards our hypothesis derived from the ICF model QWHO| 02001) ), 
we can confirm that the bulk of individual level variables form one com- 
ponent and support the biopsychosocial model of health: Functional and 
general health influence each other and are connected with a variety of en- 
vironmental and personal factors. However, not all candidate personal and 
environmental factors were related in our study. This may be due to our 
conservative upper bound on the error that is likely to favor false negatives, 
i.e. missing edges. There may also be an issue with our selection of variables 
that was restricted by the choices of the original survey team. In particular, 
macro-level variables pertaining information about the counties, in which the 
individuals are nested, form a second component. It may be that their effect 
is already contained in the individual-level variables, for example paid work. 
Five variables do not appear in the graph entirely: social network utilization, 
migration background, smoker, work restriction, and LPA. If we remove the 
three macro-level variables GDP, Gini, and crime rate from the model, the 
connectivity of the individual-level component does not change. Instead, the 
two variables migration background and social network utilization are now 
present as a separate component (not shown). 

Unfortunately, lack of information on the directions of relationships is a 
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weakness of CIGs. Also, condition (CI) of Theorem 1 and the exchangeability 
condition have likely been violated. Regardless, given the high face validity 
of the findings and the achievement of error control in the mixed setting for 



344 small p in Section |4.3[ the results seem satisfactory. 

345 The runtime of GRaFo depends also on n, even if p is small. Hence, 

346 estimation of the SHS graph was executed in parallel on 10 cores of the 

347 BRUTUS cluster with a runtime of roughly 8 hours. 

348 6. Conclusion 

349 We propose GRaFo (Graphical Random Forests) performed satisfactory, 

350 mostly on par or superior to StabLASSO, LASSO, and Random Forests. 

351 Error control could be achieved in all but the mixed-type simulation with p = 

352 200. Violation of assumption (CI) in Theorem 1 and of the exchangeability 

353 condition might be responsible for this behavior. In contrast, in most of 

354 the other settings GRaFo was very conservative and observed errors were 

355 well below their expected upper bound. The Ising model, the sole model 

356 not based on DAGs, was particularly hard for both GRaFo and StabLASSO 

357 resulting in few true positives if error bounds were chosen very small. 

358 Poor results for the LASSO in the multinomial and mixed case may be 

359 improved by feasible modifications of the LASSO, such as an extension of 



360 the group LASSO (Meier et al. (2008)) to multinomial responses. However, 



361 penalization among different types of variables (including the issue of scaling) 

362 is not a straightforward task. 

363 The Swiss Health Survey graph consists of an individual- and a macro- 

364 level variable cluster which were highly stable with respect to the way of 

365 handling missing values. Exclusion of the macro-level cluster did not affect 

366 the individual-level cluster. For a small error bound, our hypothesis that 

367 all factors should connect could not be fully confirmed, though a strong 

368 tendency toward the ICF's biopsychosocial model of health was evident in 

369 the individual-level cluster. 

370 7. Proof of Theorem Q] 

We have to show that for the density or point probability /: 

Xj JL Xi\X \ {Xj,Xi} o E[Xj\X \ {Xj}] = E[Xj\X \ {X^XJ]. (4) 
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We first show "=►" of eqn. Q: 

X J ±X l |Xx{X J ,X i } 
f(x j \Xs{X j })=f(x j \Xs{X j ,X i }) VXN^.Vxj 

^ J x jf( x j\X- \ = J Xjf(xj\X \ {X,-, VX \ {Xj} 

E[X,|Xx{X J }]=E[X J |Xx{X„X l }] VXs {*,-}. 

We now show "<=" of eqn. Mj): 

E[X,|X x {X,}] = E[X,|X x {X„X}] VX x {X,} 
=*(ci) f(x j \Xs{X j })=f(x j \Xs{X j ,X i }) VXsjl^V^ 
=> Xj ±X i \X\{X j ,X i }. 

371 Thus equation Q holds. 
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